
fix: fall back to CUDA events when CUPTI driver version < 13.0 #2818

Open

sha7doww wants to merge 4 commits into flashinfer-ai:main from sha7doww:fix/cupti-driver-version-fallback

Conversation


@sha7doww sha7doww commented Mar 19, 2026

Description

On systems where cupti-python >= 13 is installed but the CUDA driver is older than 13.0, cupti.activity_enable() raises NotSupportedError. Previously this call happened after the try block, so the exception was unhandled and bench_gpu_time_with_cupti crashed instead of falling back to CUDA events.

Root cause: The driver-support check was missing from the guarded import block.

Fix: Added a probe (activity_enable + activity_disable on RUNTIME) inside the existing try block so the NotSupportedError triggers the existing CUDA-event fallback path.

Related Issues

Fixes a crash when using bench_gpu_time(enable_cupti=True) on machines with CUDA driver < 13.0 but cupti-python >= 13 installed.

Pre-commit

  • pre-commit run --all-files passes (all 14 hooks green)

Tests

  • Added tests/utils/test_cupti_fallback.py — mocks cupti.activity_enable to raise an exception and verifies:
    • bench_gpu_time_with_cupti does not crash
    • Returns valid timing results via the CUDA-event fallback
    • Emits the expected UserWarning about falling back
$ pytest tests/utils/test_cupti_fallback.py -v
PASSED
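The shape of that test can be illustrated with a self-contained sketch that needs neither flashinfer nor torch; `bench`, `NotSupportedError`, and the mocked probe are stand-ins for the real names in tests/utils/test_cupti_fallback.py.

```python
import warnings
from unittest import mock


class NotSupportedError(RuntimeError):
    """Stand-in for cupti's driver-version error."""


def bench(probe):
    """Toy benchmark: try the CUPTI-style probe, fall back on failure."""
    try:
        probe()
        return ["cupti-timing"]
    except Exception:
        warnings.warn("Falling back to CUDA events", UserWarning)
        return ["cuda-event-timing"]


def test_fallback_on_probe_error():
    # Mock the probe to raise, mirroring cupti.activity_enable on an
    # old driver; assert the fallback result and the emitted warning.
    probe = mock.Mock(side_effect=NotSupportedError("driver < 13.0"))
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = bench(probe)
    assert result == ["cuda-event-timing"]
    assert any("Falling back" in str(w.message) for w in caught)


test_fallback_on_probe_error()
```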

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved GPU benchmarking robustness by verifying both CUPTI version and runtime driver support before use; falls back to alternative CUDA benchmarking when CUPTI is unavailable or activity enabling fails, emitting a warning.
  • Tests

    • Added a test validating CUPTI fallback behavior, ensuring timing results remain positive and a fallback warning is issued when CUPTI activity enabling errors.

sha7doww and others added 2 commits March 19, 2026 14:46
Probe CUPTI activity tracing support inside the existing try block
so that NotSupportedError on older drivers is caught by the existing
fallback logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify that bench_gpu_time_with_cupti gracefully falls back to CUDA
events when cupti.activity_enable raises (e.g. CUDA driver < 13.0).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue in the bench_gpu_time_with_cupti utility, preventing crashes on systems with newer cupti-python versions but older CUDA drivers. By introducing an early driver compatibility check, the system now gracefully falls back to CUDA events for GPU timing, enhancing robustness and user experience without requiring specific driver updates.

Highlights

  • bench_gpu_time_with_cupti crash fix: Resolved a crash in bench_gpu_time_with_cupti that occurred when cupti-python >= 13 was installed but the CUDA driver was older than 13.0, due to cupti.activity_enable() being called outside the guarded try block.
  • Driver support probe: Implemented an early driver support probe within the try block of bench_gpu_time_with_cupti using cupti.activity_enable() and cupti.activity_disable(). This ensures NotSupportedError is caught, triggering the existing CUDA-event fallback.
  • New test for CUPTI fallback: Added tests/utils/test_cupti_fallback.py to specifically test and verify the graceful fallback to CUDA events when CUPTI activity enablement fails, ensuring the function does not crash and provides valid timing results.



coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

Walkthrough

The CUPTI availability check in the GPU benchmarking utility now requires both a cupti-python major version ≥13 and a successful runtime probe via cupti.activity_enable(...)/cupti.activity_disable(...); failures fall back to CUDA event/graph benchmarking. A new CUDA-gated test verifies graceful fallback when cupti.activity_enable raises an error.

Changes

  • CUPTI availability check (flashinfer/testing/utils.py): bench_gpu_time_with_cupti now performs a runtime driver probe (cupti.activity_enable / cupti.activity_disable) after the version check and treats failures as lack of CUPTI support, falling back to CUDA event/graph timing.
  • CUPTI fallback test (tests/utils/test_cupti_fallback.py): New CUDA-gated test test_cupti_fallback_on_activity_enable_error that mocks the cupti import and version, forces activity_enable to raise, captures warnings, and asserts fallback timings and a "Falling back" UserWarning.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • bkryu
  • cyx-6
  • kahyunnam
  • jimmyzho
  • nv-yunzheq

Poem

🐰 I poked at CUPTI with a curious snoop,
It coughed a shrug — I hopped to a loop.
Timings kept steady, not a single flop,
I bounced on CUDA and finished on top.
A whisker twitch, benchmarks won't stop.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title clearly describes the main fix, falling back to CUDA events when the CUPTI driver version is below 13.0, which matches the primary change in the changeset.
  • Description check: ✅ Passed. The description comprehensively covers what the PR does, the root cause, the fix, related issues, pre-commit verification, and test additions, following the repository template structure.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a crash when using bench_gpu_time_with_cupti on systems with an older CUDA driver but a newer cupti-python library. The fix involves probing for driver support within the try...except block to trigger the existing fallback mechanism. A comprehensive test case has been added to ensure the fallback to CUDA events works as expected when cupti.activity_enable raises an exception. The changes are logical and well-tested. I have one minor suggestion to improve code clarity in the new test file.

fake_module = MagicMock()
fake_module.cupti = fake_cupti

real_import = __builtins__.__import__ if hasattr(__builtins__, "__import__") else __import__

medium

This line is overly complex for getting a reference to the built-in __import__ function. In standard Python 3 environments, __import__ is directly available in the global scope. You can simplify this for better readability and maintainability.

Suggested change
real_import = __builtins__.__import__ if hasattr(__builtins__, "__import__") else __import__
real_import = __import__
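For context on why the simpler form is safe: in Python 3, `builtins.__import__` is always present, and rebinding it is how a test can intercept a single module's import while deferring everything else. A minimal sketch of the pattern (the module name `some_missing_backend` and the error message are illustrative, not part of this PR):

```python
import builtins

real_import = builtins.__import__  # the standard, always-present hook


def fake_import(name, *args, **kwargs):
    # Intercept only the module under test; defer everything else.
    if name == "some_missing_backend":
        raise ImportError("simulated unavailable backend")
    return real_import(name, *args, **kwargs)


builtins.__import__ = fake_import
try:
    try:
        import some_missing_backend  # noqa: F401

        fell_back = False
    except ImportError:
        fell_back = True
finally:
    builtins.__import__ = real_import  # always restore the real hook

assert fell_back
```

Restoring the original hook in a finally block matters: a leaked fake `__import__` would break every later import in the test session.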


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/utils/test_cupti_fallback.py (1)

10-45: Prefer bench_gpu_time(..., enable_cupti=True) in this test.

Line 39 currently tests the helper directly; switching to the unified benchmarking entrypoint better matches repository test guidance while still exercising the same fallback behavior.

♻️ Suggested update
-from flashinfer.testing import bench_gpu_time_with_cupti
+from flashinfer.testing import bench_gpu_time
@@
-            times = bench_gpu_time_with_cupti(
+            times = bench_gpu_time(
                 fn=torch.matmul,
                 input_args=(a, b),
                 repeat_iters=5,
                 dry_run_iters=2,
                 cold_l2_cache=False,
+                enable_cupti=True,
             )

As per coding guidelines tests/**/*.py: Use flashinfer.testing.bench_gpu_time() for benchmarking kernels, preferring CUPTI timing with auto-fallback to CUDA events.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_cupti_fallback.py` around lines 10 - 45, The test currently
calls the internal helper bench_gpu_time_with_cupti; replace that call with the
public unified entrypoint bench_gpu_time(..., enable_cupti=True) so the test
exercises the same CUPTI fallback via the sanctioned API. Specifically, in
test_cupti_fallback_on_activity_enable_error swap the call to
bench_gpu_time_with_cupti(fn=torch.matmul, input_args=(a, b), repeat_iters=5,
dry_run_iters=2, cold_l2_cache=False) for a call to
flashinfer.testing.bench_gpu_time(fn=torch.matmul, input_args=(a, b),
repeat_iters=5, dry_run_iters=2, cold_l2_cache=False, enable_cupti=True) (or
import bench_gpu_time and call bench_gpu_time(..., enable_cupti=True)); keep the
same patches/mocks and warning capture around it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e79249fd-ec1c-45bd-8cbe-6aac24c0fd5d

📥 Commits

Reviewing files that changed from the base of the PR and between fc4e70f and 29e0f26.

📒 Files selected for processing (2)
  • flashinfer/testing/utils.py
  • tests/utils/test_cupti_fallback.py

- Use bench_gpu_time(enable_cupti=True) public API instead of
  bench_gpu_time_with_cupti directly (CodeRabbit suggestion)
- Combine nested with statements (ruff SIM117)
- Add docstrings to inner helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@yzh119 yzh119 left a comment


Hi @bkryu can you help review?

Address Gemini review suggestion — no need for __builtins__ check
in standard Python 3.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/utils/test_cupti_fallback.py (1)

13-14: This test does not have GPU architecture-specific requirements—it only requires CUDA to be available. Given that FlashInfer assumes CUDA is available in test environments, the skip decorator may be unnecessary. If retaining it for defensive robustness, align with the project's approach: either use architecture checks from flashinfer.utils (when architecture-specific support is actually required) or use get_compute_capability() to query device properties. However, for a general "CUDA required" check without architecture constraints, this simple guard is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/utils/test_cupti_fallback.py` around lines 13 - 14, Remove the
pytest.skipif decorator on test_cupti_fallback_on_activity_enable_error because
the test only needs CUDA availability and the project assumes CUDA in CI; simply
rely on the existing environment assumption, or if you prefer a defensive check,
replace the decorator with a project helper such as
flashinfer.utils.get_compute_capability() or a
has_cuda()/get_compute_capability() call to gate the test instead of using
pytest.mark.skipif(not torch.cuda.is_available()).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0212a953-4539-44d0-bf20-7a3604e1c6e3

📥 Commits

Reviewing files that changed from the base of the PR and between 2cd7e5d and bb2ee4a.

📒 Files selected for processing (1)
  • tests/utils/test_cupti_fallback.py
